Wav2KWS: Transfer Learning From Speech Representations for Keyword Spotting
Abstract
With the expanding development of on-device artificial intelligence, voice-enabled devices such as smart speakers, wearables, and other on-device or edge processing systems have been proposed. However, building or obtaining the large training datasets that are essential for robust keyword spotting (KWS) remains cumbersome. To address this problem, we propose a deep neural network that can rapidly establish a high-performance KWS system from arbitrary instruction sets. We use an encoder pretrained with a large-scale speech corpus as the backbone network and then design an effective transfer network for KWS. To demonstrate the feasibility of the proposed network, various experiments were conducted on the Google Speech Command Datasets V1 and V2. In addition, to verify the applicability of the proposed network to different languages, experiments were conducted using three Korean speech command datasets. The proposed network outperforms the state-of-the-art networks in both experiments. Furthermore, the proposed network can understand real human voice even when trained with synthetic text-to-speech data.
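The transfer-learning idea in the abstract (a frozen pretrained encoder feeding a small trainable keyword head) can be illustrated with a minimal sketch. This is not the paper's architecture: the fixed random projection below is only a stand-in for a pretrained speech encoder, the linear softmax head is a stand-in for the transfer network, and every name (`make_frozen_encoder`, `encode`, `train_head`, `predict`) and every constant is hypothetical. The "waveforms" are toy constant signals, not speech.

```python
import math
import random

FRAME = 8      # samples per frame (toy value, an assumption)
FEAT_DIM = 4   # encoder output size (toy value, an assumption)


def make_frozen_encoder(seed=0):
    """Stand-in for a pretrained encoder: a fixed random projection matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(FRAME)] for _ in range(FEAT_DIM)]


def encode(encoder, wave):
    """Project each frame, apply tanh, then mean-pool over frames."""
    frames = [wave[i:i + FRAME] for i in range(0, len(wave) - FRAME + 1, FRAME)]
    pooled = [0.0] * FEAT_DIM
    for frame in frames:
        for j, row in enumerate(encoder):
            pooled[j] += math.tanh(sum(w * x for w, x in zip(row, frame)))
    return [p / len(frames) for p in pooled]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_head(features, labels, n_classes, epochs=200, lr=0.5):
    """Train only the linear keyword head with SGD; the encoder is not updated."""
    weights = [[0.0] * FEAT_DIM for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        for feat, label in zip(features, labels):
            logits = [sum(w * f for w, f in zip(row, feat)) + b
                      for row, b in zip(weights, bias)]
            probs = softmax(logits)
            for c in range(n_classes):
                # Gradient of cross-entropy with respect to logit c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                bias[c] -= lr * grad
                for j in range(FEAT_DIM):
                    weights[c][j] -= lr * grad * feat[j]
    return weights, bias


def predict(weights, bias, feat):
    logits = [sum(w * f for w, f in zip(row, feat)) + b
              for row, b in zip(weights, bias)]
    return logits.index(max(logits))


# Toy "keywords": class 0 is a positive constant signal, class 1 a negative one.
encoder = make_frozen_encoder()
waves, labels = [], []
for amplitude in [0.4, 0.6, 0.8, 1.0]:
    waves.append([amplitude] * 32)
    labels.append(0)
    waves.append([-amplitude] * 32)
    labels.append(1)

# Features are computed once from the frozen encoder, then only the head is trained.
features = [encode(encoder, w) for w in waves]
weights, bias = train_head(features, labels, n_classes=2)
predictions = [predict(weights, bias, f) for f in features]
```

Because the encoder is never modified, its features can be computed once and cached, which is one reason this kind of setup can be trained quickly on a small keyword set.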
Similar resources
Semantic keyword spotting by learning from images and speech
We consider the problem of representing semantic concepts in speech by learning from untranscribed speech paired with images of scenes. This setting is relevant in low-resource speech processing, robotics, and human language acquisition research. We use an external image tagger to generate soft labels, which serve as targets for training a neural model that maps speech to keyword labels. We int...
Topic recognition for news speech based on keyword spotting
This paper describes topic identification for Japanese TV news speech based on the keyword spotting technique. Three thousand nouns are selected as keywords that contribute to topic identification, based on the criteria of mutual information and word length. This set of keywords identified the correct topic for 76.3% of articles from newspaper text data. Further, we performed keywor...
Comparison of keyword spotting methods for searching in speech
This paper presents and discusses keyword spotting methods for searching in speech. In contrast with searching in text, searching in speech, or in multimedia data generally, still represents a challenge. The aim of the paper is to present a keyword spotting (KWS) method based on a large vocabulary continuous speech recognition (LVCSR) system, based on phonetics decoder, and keyword spotting u...
Comparison of keyword spotting approaches for informal continuous speech
This paper describes several approaches to keyword spotting (KWS) for informal continuous speech. We compare acoustic keyword spotting, spotting in word lattices generated by large vocabulary continuous speech recognition and a hybrid approach making use of phoneme lattices generated by a phoneme recognizer. The systems are compared on carefully defined test data extracted from ICSI meeting dat...
Deep Residual Learning for Small-Footprint Keyword Spotting
We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as our benchmark. Our best residual network (ResNet) implementation significantly outperforms Google’s previous convolutional neural networks in terms of accuracy. By varying model depth and width, we can achieve compact models th...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3078715